
Neural Information Processing Systems

We also establish new convergence complexities for reaching an approximate KKT solution when the objective is smooth or nonsmooth, deterministic or stochastic, and convex or nonconvex, with complexity on par with gradient descent for unconstrained optimization problems in the respective cases. To the best of our knowledge, this is the first study of first-order methods with complexity guarantees for nonconvex sparse-constrained problems.
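As a hedged illustration of a first-order method for a sparsity-constrained problem (not necessarily the algorithm studied in this paper), iterative hard thresholding applies a gradient step followed by projection onto the constraint $\|x\|_0 \le s$; the least-squares objective below is an assumed example:

```python
import numpy as np

def hard_threshold(x, s):
    """Project onto {x : ||x||_0 <= s}: keep the s largest-magnitude entries."""
    z = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-s:]
    z[idx] = x[idx]
    return z

def iht(grad, x0, s, step, iters=500):
    """Projected gradient descent under the sparsity constraint ||x||_0 <= s."""
    x = hard_threshold(x0, s)
    for _ in range(iters):
        x = hard_threshold(x - step * grad(x), s)
    return x

# Illustrative instance: min ||Ax - b||^2  s.t.  ||x||_0 <= 2
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 8))
x_true = np.zeros(8)
x_true[[1, 5]] = [2.0, -3.0]
b = A @ x_true
x_hat = iht(lambda x: 2 * A.T @ (A @ x - b), np.zeros(8), s=2, step=0.005)
```

The step size is chosen conservatively (roughly half of $1/L$ for this instance); the nonconvex projection makes the iteration sensitive to larger steps.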



Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks

Neural Information Processing Systems

We experimentally show higher accuracy in gradient estimation and demonstrate more stable and better-performing training in deep convolutional models with both proposed methods.



SC-OGM [63]: $x_{k+1} = x_k^+ + \frac{\kappa - 1}{\sqrt{8\kappa + 1} + 2 + \kappa}\,(x_k^+ - x_{k-1}^+)$
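The update above is a gradient step $x_k^+$ followed by momentum extrapolation. A minimal sketch of this iteration pattern, using the classical $(\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)$ coefficient as a stand-in since the exact SC-OGM constant is not recoverable from this excerpt:

```python
import numpy as np

def momentum_method(grad, x0, L, mu, iters=300):
    """Gradient step plus extrapolation:
        x_k^+    = x_k - (1/L) grad(x_k)
        x_{k+1}  = x_k^+ + beta * (x_k^+ - x_{k-1}^+)
    beta below is the classical strongly convex momentum coefficient,
    used here only as an illustrative stand-in for the SC-OGM constant."""
    kappa = L / mu
    beta = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)
    x = x0.copy()
    x_plus_prev = x0.copy()
    for _ in range(iters):
        x_plus = x - grad(x) / L
        x = x_plus + beta * (x_plus - x_plus_prev)
        x_plus_prev = x_plus
    return x
```

On a strongly convex quadratic with condition number $\kappa = L/\mu$, this scheme contracts at roughly the accelerated rate $1 - O(1/\sqrt{\kappa})$ per iteration.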

Neural Information Processing Systems

Proof of Observation 4. Figure 2 (middle) depicts the plane of iterates of TMM. Now, we complete the proof by showing that $\{U_k\}_{k=0}^{K}$ is nonincreasing. The optimality condition for the strongly convex subproblem implies that there exists $u \in \partial g(x^+)$ such that $\nabla f(x) + u + L(x^+ - x) = 0$. Linear coupling [4] interprets acceleration as a unification of gradient descent and mirror descent. The auxiliary iterates of our setup are referred to as the mirror descent iterates in the linear coupling viewpoint.
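The optimality condition invoked in the proof can be checked numerically. A small sketch, assuming the composite model $f + g$ with $g = \lambda\|\cdot\|_1$ so that the proximal step is soft-thresholding (an illustrative choice, not taken from the paper):

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# Assumed instance: f(x) = 0.5 ||x - c||^2 (so grad f is 1-Lipschitz, L = 1),
# g(x) = lam * ||x||_1.
lam, L = 0.5, 1.0
c = np.array([1.2, -0.3, 0.05, -2.0])
x = np.zeros_like(c)

grad = x - c                                    # gradient of f at x
x_plus = soft_threshold(x - grad / L, lam / L)  # proximal gradient step
u = L * (x - x_plus) - grad                     # implied subgradient of g at x_plus

# Optimality condition of the prox step: grad f(x) + u + L (x_plus - x) = 0
assert np.allclose(grad + u + L * (x_plus - x), 0.0)
# u is a valid element of the subdifferential of lam*||.||_1 at x_plus
assert np.all(np.abs(u) <= lam + 1e-12)
nz = x_plus != 0
assert np.allclose(u[nz], lam * np.sign(x_plus[nz]))
```

By construction $u = L(x - x^+) - \nabla f(x)$ satisfies the stated equation; the assertions verify that this $u$ really lies in $\partial g(x^+)$.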


ReLIZO: Sample Reusable Linear Interpolation-based Zeroth-order Optimization

Neural Information Processing Systems

The first step, gradient estimation, is critical since it provides the essential direction for updating the variables; it has been explored by many recent works [5, 30].
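A common form of zeroth-order gradient estimation is the two-point random finite-difference estimator; a minimal sketch (illustrative, not ReLIZO's linear-interpolation scheme):

```python
import numpy as np

def zo_gradient(f, x, mu=1e-4, n_samples=64, rng=None):
    """Two-point zeroth-order gradient estimate: average
    (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u over random Gaussian directions u.
    For Gaussian u with E[u u^T] = I, the estimator is unbiased for linear f
    and approaches grad f(x) as mu -> 0 and n_samples grows."""
    rng = np.random.default_rng(rng)
    g = np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape)
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / n_samples
```

Each estimate costs two function evaluations per direction; sample-reuse schemes such as the one named in the title aim to cut this query cost by recycling earlier evaluations.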